Abstract. We present a generalisation of King's symbolic execution
technique called compact symbolic execution. It is based on the concept
of templates: a template is a declarative parametric description of a
program part that generates paths in the symbolic execution tree with
regularities in program states along them. Typical sources of these paths
are program loops and recursive calls. Using the templates we fold the
corresponding paths into single vertices and therefore considerably reduce
the size of the tree without loss of any information. There are even
programs for which compact symbolic execution trees are finite even though
the classic symbolic execution trees are infinite.



1 Introduction 

Classic symbolic execution as proposed by King in 1976 [8] systematically
explores all real paths in an analysed program. There is typically a huge
(or even infinite) number of real paths even for very small and simple
programs. Therefore, exploration of the real paths becomes a serious
problem, known as the path explosion problem.

Compact symbolic execution also explores all real program paths, but in a
very compact manner. We analyse a given program before we start its symbolic
execution. We look for those parts of the program which might produce real
paths with some regularities in program states along them. Typically, program
loops and recursion produce these regularities. We analyse the program parts
independently from the remainder of the program. If the analysis of a part
succeeds, then the result is a template, i.e. a declarative parametric
description of the complete behaviour of the analysed part. Therefore, the
output of the program analysis is a set of templates. Now we can execute the
program symbolically with the templates. Until we reach one of the
successfully analysed program parts, we proceed just like in classic symbolic
execution. Let us now suppose we have just reached such a part. Having a
template for the part, we do not need to symbolically execute the interior of
the part. We just instantiate the template at the end of the current path and
then we jump behind the part, where we continue with classic symbolic
execution again.

Let us consider a symbolic execution reaching a loop. The execution may
fork into a huge number of other symbolic executions during the execution of
the loop. Each such execution has its own path in the symbolic execution tree
of classic symbolic execution. But having a template for the loop, we
represent all these paths by a single one with the instantiated template. In
other words, a single path explored by compact symbolic execution may
represent a huge number of paths explored by classic symbolic execution. This
is the cause of the considerable space savings of compact symbolic execution.
On the other hand, we will see that compact symbolic execution places higher
demands on the performance of SMT solvers than the classic one.

The worst case for compact symbolic execution occurs when we fail to compute
any template for a given program. Compact symbolic execution then reduces to
the classic one, and we gain no space savings.

2 Overview 

In this section we give an intuition of compact symbolic execution. For
simplicity of presentation we use the following definition of a program.
Although our programs are simple, they support typical imperative constructs
and recursion.

Definition 1 (Program) A program is a collection of functions and global
variables. Each function has its own local variables. All program variables and
functions have different names. Exactly one function is marked as the starting
one. Each function is represented as an oriented graph. Vertices in the graph
identify program locations, while edges define transitions between them. We
distinguish a single entry and exit location in each graph. There is no
in-edge to the entry location and there is no out-edge from the exit one. We
label edges by actions to be taken when moving between connected locations.
An action can be

(1) An assignment of the form <variable> := <expression>,

(2) Call by value statements

(a) <variable> := <function-name>(<arg-list>), or

(b) <function-name>(<arg-list>),

(3) A return value statement ret <expression>,

(4) skip statement, which does nothing, or

(5) A boolean expression over program variables.

If an edge $e = (u, v)$ is labelled by one of the actions (1)-(4), then the
out-degree of $u$ is 1. Otherwise, the label of $e$ is an action (5), the
out-degree of $u$ is 2 and its out-edges are labelled by boolean expressions
$\gamma$ and $\neg\gamma$. No action (2) can reference the starting function
and no entry nor exit location is incident with an edge having an action (2).
Each function $f$ is assigned a unique global variable $ret_f$ used for
actions (2a) to save a return value being later assigned to the destination
variable. For simplicity we consider neither pointer arithmetic nor heap
allocations. We prevent invalid operations in actions (like division by zero,
etc.) by branchings into error locations. An error location is any location
with a single out-edge heading back to that location and labelled with the
skip action.

We can see an example of a program at Figure 1 (a). The depicted function
linSrch returns the least index i into the array A such that A[i]==x. If x is
not in A at all, then it returns -1.
We first briefly describe classic symbolic execution as proposed by King [8].
Instead of passing concrete data into parameters of the starting function, we
pass symbols from a set $\{\alpha_0, \alpha_1, \ldots\}$. Let us suppose we
pass symbols $\alpha_0$ and $\alpha_1$ to variables a and b respectively.
After executing an action c:=2*a+b the variable c will contain a symbolic
expression $2\alpha_0 + \alpha_1$ as its value. Symbolic memory is a function
$\theta$ from program variables to a set of symbolic expressions. We further
maintain a boolean symbolic expression $\varphi$ called path condition. It
represents a complete identifier of a particular program path taken during an
execution. $\varphi$ is initially true and it can be updated at program
branchings. Let $\theta$ be a symbolic memory having $\theta(a) = \alpha_0$,
$\theta(b) = \alpha_1$ and $\theta(c) = 2\alpha_0 + \alpha_1$ and let c-a>2*b
and c-a<=2*b be actions of out-edges of a branching location. For the first
action we proceed as follows. We evaluate the action in $\theta$. The result
is a boolean symbolic expression $\alpha_0 + \alpha_1 > 2\alpha_1$. If
$\varphi \rightarrow (\alpha_0 + \alpha_1 > 2\alpha_1)$ is satisfiable, we
update $\varphi$ to $\varphi \wedge (\alpha_0 + \alpha_1 > 2\alpha_1)$ and we
continue the execution by crossing the edge having the action. Then we
proceed similarly for the second action. Note that if both implications are
satisfiable, we fork the execution into two parallel and independent
executions. Besides a symbolic memory and a path condition we commonly have a
call stack $S$ and we also need to identify a current program location $l$.
Putting all the things together we get a program state represented by a tuple
$s = (\theta, \varphi, S, l)$. Note that we understand call stack records as
pairs $(\sigma, l)$, where $l$ is a return location and $\sigma$ is a
restriction of a symbolic memory to local variables. Further, we commonly
describe the symbolic execution of a program by a tree structure called
symbolic execution tree. Vertices of the tree are related to program
locations visited during the execution and edges reflect transitions between
the locations. Each vertex of the tree is labelled by a related program
state. But instead of labels T and F for branching edges (as proposed by
King), we label them by evaluated actions of the branching edges.
Figure 1 (b) depicts a part of the symbolic execution tree of the example
program from Figure 1 (a) (with omitted program states labelling the
vertices). Please ignore grey regions in the tree for now. We assume that
classic symbolic execution of the program started with an initial symbolic
memory $\theta = \{(i, \alpha_0), (n, \alpha_1), (x, \alpha_2), (A, \alpha_3)\}$.
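
The branching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: linear symbolic expressions are stored as dictionaries from symbol names to coefficients, the names `add`, `scale`, `value` and `satisfiable` are hypothetical helpers, and a brute-force search over a small integer range stands in for a real SMT solver. The sketch checks the conjunction of the path condition with each branch condition, starting from the initial path condition true.

```python
from itertools import product

def add(e1, e2, k=1):
    """Return e1 + k*e2 for linear expressions stored as {symbol: coefficient}."""
    out = dict(e1)
    for sym, c in e2.items():
        out[sym] = out.get(sym, 0) + k * c
    return {s: c for s, c in out.items() if c != 0}

def scale(e, k):
    """Return k*e."""
    return {s: k * c for s, c in e.items() if k * c != 0}

def value(e, env):
    """Evaluate a linear expression under a concrete assignment of symbols."""
    return sum(c * env[s] for s, c in e.items())

def satisfiable(constraints, symbols, bound=3):
    """Brute-force stand-in for an SMT query (assumption: small integer range)."""
    for vals in product(range(-bound, bound + 1), repeat=len(symbols)):
        env = dict(zip(symbols, vals))
        if all(pred(env) for pred in constraints):
            return True
    return False

# Symbolic memory after passing a0 and a1 to the variables a and b.
theta = {"a": {"a0": 1}, "b": {"a1": 1}}
# Action c := 2*a + b.
theta["c"] = add(scale(theta["a"], 2), theta["b"])

# Evaluate both sides of the branching action c - a > 2*b in theta.
lhs = add(theta["c"], theta["a"], -1)   # a0 + a1
rhs = scale(theta["b"], 2)              # 2*a1

taken = lambda env: value(lhs, env) > value(rhs, env)
not_taken = lambda env: not taken(env)

path_condition = []  # the empty conjunction, i.e. true
fork = [satisfiable(path_condition + [branch], ["a0", "a1"])
        for branch in (taken, not_taken)]
```

Both entries of `fork` are true here, which corresponds to the execution forking into two independent executions.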

We often use the following dot-notation to access elements of tuples. If
$s = (\theta, \varphi, S, l)$ is a program state, then $s.\theta$ denotes its
symbolic memory, $s.\varphi$ denotes its path condition, $s.S$ is its call
stack and $s.l$ is a current program location. Further, if $u$ is a vertex of
a symbolic execution tree, then $u.s$ denotes the program state labelling the
vertex. And instead of $u.s.\theta$, $u.s.\varphi$, $u.s.S$ and $u.s.l$ we
simply write $u.\theta$, $u.\varphi$, $u.S$ and $u.l$. Finally, if $S$ is a
call stack then we use dot-notation to access the record at the top of the
call stack. So, for example, $S.l$ denotes the return location of the record
at the top of $S$.

Symbols $\{\alpha_0, \alpha_1, \ldots\}$ in classic symbolic execution
represent input values to the whole program. We generalise this concept to
allow symbolic execution of parts of an analysed program independently of the
remainder. Each such part uses the symbols $\{\alpha_0, \alpha_1, \ldots\}$
relative to a chosen entry location to the part. Then using a composition of
program states (defined later) we can express any run of classic symbolic
execution as a composition of program states resulting from analyses of the
parts. Let $s = (\theta, \varphi, S, l)$ be a program state resulting from a
symbolic execution from a program location $l_0$ (e.g. the entry location of
the starting function), up to an entry location $l$ of an independently
analysed program part. Let $s' = (\theta', \varphi', S', l')$ be a program
state resulting from the analysis of the part, i.e. $s'$ represents a
symbolic execution from the entry location $l$ to some exit location $l'$
from the part. Then
$s \circ s' = (\theta \circ \theta', \varphi \wedge \theta\langle\varphi'\rangle, S \circ (\theta \circ S'), l')$
is a composed program state representing symbolic execution from $l_0$ to
$l'$ through the analysed part (entered in location $l$). We can see that
composition of program states is implemented as composition of their
individual components. We discuss the details of these operations in
Section 3. Only note that the composed path condition is
$\varphi \wedge \theta\langle\varphi'\rangle$ rather than
$\varphi \wedge \varphi'$. This is because $\varphi'$ may contain some
symbols $\alpha_i$. But they are related to the entry location $l$ of the
analysed part and not to the location $l_0$. Therefore, we have to compose
$\varphi'$ with $\theta$ first to express $\varphi'$ in terms of symbols
relative to location $l_0$. We achieve a similar effect of shifting symbols
from location $l$ to $l_0$ in the compositions $\theta \circ \theta'$ and
$\theta \circ S'$.
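
As a rough illustration of the composition of symbolic memories and path conditions, the following Python sketch substitutes each symbol of the analysed part by the expression that the outer symbolic memory holds for the corresponding variable. All names here are assumptions made for the example: expressions are linear and stored as `{symbol: coefficient}` dictionaries with the key `"1"` for the constant term, `var_of` maps each symbol to its program variable, and a path condition is a list of expressions each read as "expression > 0". Call stack composition is left out.

```python
def substitute(expr, theta, var_of):
    """theta<expr>: replace each symbol by theta's expression for its variable."""
    out = {}
    for sym, coeff in expr.items():
        if sym == "1":                       # constant term stays a constant
            out["1"] = out.get("1", 0) + coeff
            continue
        for s2, c2 in theta[var_of[sym]].items():
            out[s2] = out.get(s2, 0) + coeff * c2
    return {s: c for s, c in out.items() if c != 0}

def compose_memory(theta, theta_part, var_of):
    """theta o theta_part: shift the part's symbols to the outer entry location."""
    return {v: substitute(e, theta, var_of) for v, e in theta_part.items()}

def compose_condition(phi, theta, phi_part, var_of):
    """phi AND theta<phi_part>, with conjunction modelled as list concatenation."""
    return phi + [substitute(c, theta, var_of) for c in phi_part]

var_of = {"a0": "i", "a1": "n"}

# State reached before the part: i = a0 + 1, n = a1, path condition true.
theta = {"i": {"a0": 1, "1": 1}, "n": {"a1": 1}}
phi = []

# Result of analysing the part on its own: i = a0 + 2, n = a1,
# under the condition a1 - a0 > 0 (symbols relative to the part's entry).
theta_part = {"i": {"a0": 1, "1": 2}, "n": {"a1": 1}}
phi_part = [{"a1": 1, "a0": -1}]

composed_theta = compose_memory(theta, theta_part, var_of)
composed_phi = compose_condition(phi, theta, phi_part, var_of)
```

In this toy instance the composed memory maps i to a0 + 3, and the composed path condition reads a1 - a0 - 1 > 0, i.e. the part's condition expressed in symbols of the outer entry location.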



Fig. 1. (a) A program with a function linSrch(A,n,x) . (b) Symbolic execution tree 
of function linSrch. (c) Compact symbolic execution tree of function linSrch. 



In the symbolic execution tree at Figure 1 (b) there is a single path
highlighted by a sequence of grey regions. Vertices in each region are
related to the same sequence of program locations: b, c, d, b. Moreover, we
enter the path at a vertex referencing location b and we can leave the path
either by stepping into a vertex referencing location e or into a vertex
referencing location f. Let us denote the entry vertex into the path as $b_0$
and the exit vertices from the path referencing locations e and f as
$e_0, e_1, \ldots$ and $f_0, f_1, \ldots$ respectively, indexed from the top
down. Our goal is to completely eliminate the path in grey from the tree,
while still representing all real program paths. One way to do so is to
represent the whole path by a single vertex, b say, with two direct
successors. The first successor, e say, represents all the exit vertices
$e_i$ from the path and the second, f say, represents all the exits $f_j$.
Note that names of the vertices b, e and f also represent program locations
they reference. We label the vertex b by the program state labelling $b_0$.
But the question is what program states we should assign to the vertices e
and f. Note that two different vertices $e_i$ and $e_j$ may be labelled by
different program states. So, for the vertex e we need to introduce a program
state $e.s[\![\kappa]\!]$, parametrised by a parameter $\kappa$, such that
each program state $e_i.s$ can be equivalently expressed by
$e.s[\![\kappa]\!]$, when $\kappa$ is substituted by some number $\nu$. Of
course, for different states $e_i.s$ and $e_j.s$ there are different numbers,
say $\nu_i$ and $\nu_j$, for parameter substitution. We similarly need a
parametrised program state $f.s[\![\kappa]\!]$ for the vertex f. We compute
the states $e.s[\![\kappa]\!]$ and $f.s[\![\kappa]\!]$ before we start
symbolic execution of the program from Figure 1 (a) by analysing the
following part of it. The part consists of all the locations b, c, d, e, f
discussed above and of all the edges between them. Note that the sequence
b, c, d, b of locations forms a cyclic path inside the analysed part. This
cycle is actually the source of the path in grey regions. Nevertheless, we
want to describe program states at exits from the part. The exits from the
part are target vertices of those edges of the part which do not belong to
the cycle. Therefore, locations e and f are the exits from the part. We also
identify the location b as the entry location into the part, since we can
enter the part by stepping into location b. The part is completely defined
now. We analyse it independently from the remainder of the program. It mainly
means that if we use some symbols $\alpha_i$ in the analysis, then they are
related to the entry location b of the part and not to the entry location of
the whole program. At this point we are more concerned with the formulation
of a result from the analysis and its usage than with the analysis itself.
Therefore, we postpone its description to Section 5. We assume here that the
key properties $e.s[\![\kappa]\!]$ and $f.s[\![\kappa]\!]$ from the analysis
are already computed, so we may formulate an output from the analysis of the
part as the following template

$$t = (b, 2, \{(\theta_e, \varphi_e, [], e)[\![\kappa]\!], (\theta_f, \varphi_f, [], f)[\![\kappa]\!]\}),$$

where b is the entry location to the analysed part, the number 2 identifies
the number of following parametrised program states and the remaining two
tuples are the parametrised program states $e.s[\![\kappa]\!]$ and
$f.s[\![\kappa]\!]$ respectively. Note that [] identifies an empty call
stack. The template contains all the information we need to build a compact
symbolic execution tree, where the path in grey is folded as described above.

Let us symbolically execute the program at Figure 1 (a) with the template
$t$. We construct a compact symbolic execution tree during the execution. The
tree is depicted at Figure 1 (c). We apply classic symbolic execution, until
we reach the entry location $t.b$. Let b be the vertex in the tree, when we
reach the location $t.b$ and let $s$ be the program state $b.s$. We now
instantiate the template. Since we have exactly two program states in $t$, we
create exactly two successor vertices e and f of the vertex b in the tree.
The vertices e and f reference locations $t.e$ and $t.f$ respectively and
they are further labelled by program states
$s \circ (t.\theta_e, t.\varphi_e, [], t.e)[\![\kappa]\!]$ and
$s \circ (t.\theta_f, t.\varphi_f, [], t.f)[\![\kappa]\!]$ respectively. We
finish the instantiation of $t$ by creating edges (b, e) and (b, f) labelled
by symbolic expressions $s.\theta\langle t.\varphi_e[\![\kappa]\!]\rangle$ and
$s.\theta\langle t.\varphi_f[\![\kappa]\!]\rangle$ respectively. The
situation is also depicted at Figure 1 (c). Then we continue from both
vertices e and f independently using classic symbolic execution again. Both
executions reach the function exit location g in one step and compact
symbolic execution terminates.
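
To make the role of the parameter more tangible, here is a small Python sketch. It does not reproduce the actual memories and path conditions of the template (those are computed by the analysis described later in the paper); the two parametrised conditions below are hypothetical stand-ins, written under the assumption that the parameter counts completed loop iterations of a linear search. The sketch checks by enumeration that every concrete run matches exactly one exit together with exactly one parameter value.

```python
from itertools import product

def lin_srch(A, n, x):
    """Concrete reference: least i < n with A[i] == x, else -1."""
    i = 0
    while i < n:
        if A[i] == x:
            return i
        i += 1
    return -1

def stays_in_loop(A, n, x, i0, k):
    """Hypothetical quantified part: iterations 0..k-1 all continue the loop."""
    return all(i0 + t < n and A[i0 + t] != x for t in range(k))

def phi_found(A, n, x, i0, k):
    """Assumed condition for leaving the loop after k iterations with a match."""
    return stays_in_loop(A, n, x, i0, k) and i0 + k < n and A[i0 + k] == x

def phi_not_found(A, n, x, i0, k):
    """Assumed condition for leaving the loop after k iterations without a match."""
    return stays_in_loop(A, n, x, i0, k) and i0 + k >= n

def instantiate(A, n, x, i0=0, max_k=10):
    """Return every (exit, k) whose parametrised condition holds for the input."""
    hits = []
    for k in range(max_k + 1):
        if phi_found(A, n, x, i0, k):
            hits.append(("found", k))
        if phi_not_found(A, n, x, i0, k):
            hits.append(("not_found", k))
    return hits

# Every concrete input corresponds to exactly one exit and one parameter value.
results = {}
for A in product(range(2), repeat=3):
    for x in range(2):
        results[(A, x)] = instantiate(A, 3, x)
```

Each of the infinitely many exit vertices of the classic tree corresponds to one value of the parameter, which is why two parametrised successors suffice in the compact tree.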

Let us now have a look at Figure 2 (a) depicting a program with a function
countIf. The function counts the number of elements in array A having values
equal to x. We show the symbolic execution tree of the program at
Figure 2 (b). There we can see several sequences of grey regions. Drawing on
the previous example, we can easily detect that all the paths in grey are
generated by a single program part consisting of locations c, d, e, f, g and
edges between them. But there are two cyclic paths $\pi = c, d, e, f, c$ and
$\pi' = c, d, f, c$ inside the part. Nevertheless, the grey regions highlight
only the cycle $\pi$. So, we ignore the cycle $\pi'$ and $\pi$ is therefore
the only cycle we consider. The remainder is now obvious. The locations f and
g are exits from the part and c is the entry location into the part. The
analysis of the part (discussed later in Section 5) computes the following
template

$$t = (c, 2, \{(\theta_f, \varphi_f, [], f)[\![\kappa]\!], (\theta_g, \varphi_g, [], g)[\![\kappa]\!]\}).$$

Compact symbolic execution with the template $t$ computes the compact
symbolic execution tree depicted at Figure 2 (c). The tree is basically a
singly linked list. Note that we instantiate the template each time we reach
the location c. But for each such instantiation we need a fresh parameter to
prevent parameter collisions with previous instantiations. We assume we have
infinitely many different names for the parameters. Therefore, expressions
and program states at Figure 2 (c) are as follows:
$\gamma_g^i = s_c^i.\theta\langle t.\varphi_g[\![\kappa_i]\!]\rangle$,
$\gamma_f^i = s_c^i.\theta\langle t.\varphi_f[\![\kappa_i]\!]\rangle$,
$s_g^i = s_c^i \circ (t.\theta_g, t.\varphi_g, [], g)[\![\kappa_i]\!]$ and
$s_f^i = s_c^i \circ (t.\theta_f, t.\varphi_f, [], f)[\![\kappa_i]\!]$.

The sequences of grey regions in the tree at Figure 2 (b) go bottom left.
But imagine they would go bottom right. Then each region would represent a
sequence of program locations c, d, f, c. If we analysed these sequences of
grey regions more closely, we would realise that there is a part of the
program from Figure 2 (a) consisting of vertices c, d, f, e, g, where
c, d, f, c is the only cycle in the part, c is the entry location into the
part and locations e and g are exits from the part. If we further built a
template from the part and ran compact symbolic execution with it, we would
also receive a compact symbolic execution tree forming basically a singly
linked list.

Besides cyclic paths, recursive calls also produce real program paths with
regularities in program states along them. At Figure 3 (a) there is a
recursive function linSrchRec which is equivalent to the function linSrch
discussed before. The symbolic execution tree of the function is depicted at
Figure 3 (b). The root of the tree is the left-most vertex referencing
program location a. There are two sequences of grey regions. The top sequence
represents recursive calls, while the bottom sequence represents returning
from the calls. We see that the top sequence goes from left to right. The
bottom sequence goes in the opposite direction. We can further see there is a
one to one correspondence between regions of both sequences. Below each
region in the top sequence, there is a single region of the bottom sequence.
Paths in both sequences of regions are connected in the tree. But this is not
shown in the figure. The connection happens when all the recursive calls are
done and some base case is executed in the recursive function. Then we get to
the path of the bottom regions.

Fig. 3. (a) A program with a recursive function linSrch(A, i, n, x). (b)
Symbolic execution tree of the recursive function linSrch. (c) Compact
symbolic execution tree of the recursive function linSrch.

Let us first focus on the path at the top sequence of regions. Vertices in
each region are related to the same sequence of program locations:
a, b, c, a. Moreover, we enter the path in a vertex referencing location a
and we can leave the path either by stepping into a vertex referencing
location f or into a vertex referencing location e. If we look at the program
(at Figure 3 (a)), the sequence a, b, c, a forms a cyclic path in it. Of
course, the edge (c, a) is not explicit in the program. But we consider it as
a meta-edge labelled by an action simulating the effect of the function call,
as defined by the action of edge (c, d). We now define a program part, say
$P_1$, consisting of the cyclic path, the entry location a and two exit
locations f and e. The part represents the phase of recursive calls of the
function linSrch.

Now we similarly analyse the path in the bottom sequence of regions. Each
region repeats the same sequence of program locations g, d, g. The path is
entered in a vertex referencing location g, but there is no exit from the
path. The sequence g, d, g of locations forms a cyclic path in the program
(at Figure 3 (a)). Note that we assume there is an artificial edge (g, d)
enclosing the cycle. The action of this edge is supposed to simulate the
effect of return from the function call, as defined by the action of edge
(c, d). We want to define a program part $P_2$ representing the phase of
returning from recursive calls. We have the cyclic path and we have the entry
location g to the part. But there is no exit from the part. Obviously, the
recursive calls end in location g, where we leave the function. Therefore,
our exit location is g and we have the program part $P_2$. Note that if we
want to formally match the exit location detection algorithm introduced for
previous examples, we may imagine there is an edge from g back to g labelled
by the skip action.
For the program parts $P_1$ and $P_2$ we compute the following templates
$t_1$ and $t_2$ as described in the previous examples.

$$t_1 = (a, 2, \{(\theta_f, \varphi_f, [], f)[\![\kappa]\!], (\theta_e, \varphi_e, [], e)[\![\kappa]\!]\})$$
$$t_2 = (g, 1, \{(\theta_g[\![\kappa]\!], \mathit{true}, [], g)\}).$$

Note that the path condition of $t_2$ is simply true, since we cannot escape
from the path. In other words, as there is no branching along the path, the
path condition cannot be updated from its initial value true. It is important
to note that both templates use exactly the same parameter. The use of the
same parameter creates a link between the number of recursive calls and the
number of returns from them. Having the templates we are able to formulate
the template $t$ for the recursive function linSrchRec.

$$t = (a, 2, \{(\theta_f, \varphi_f, [], f)[\![\kappa]\!], (\theta_e, \varphi_e, [], e)[\![\kappa]\!]\}, \theta_g[\![\kappa]\!], g)$$

The template $t$ contains the whole template $t_1$, but it takes only the
symbolic memory $\theta_g$ and the exit location g from the template $t_2$.

We are ready to start compact symbolic execution with the template $t$. The
compact symbolic execution tree for the program is depicted at Figure 3 (c).
First we step into the program location a. The tree contains only the root
vertex referencing location a. The location a is the entry location of $t$.
Hence, we instantiate the first part of $t$ (related to the phase of
recursive calls, i.e. related to $t_1$) into the tree. The number 2 in $t$
identifies that the root will have two successor vertices referencing
locations f and e and they will be labelled by program states
$s_f = (\theta_f, \varphi_f, [(t, \kappa)] \circ [], f)[\![\kappa]\!]$ and
$s_e = (\theta_e, \varphi_e, [(t, \kappa)] \circ [], e)[\![\kappa]\!]$
respectively. Note that we omitted composition of these states with the
initial program state labelling the root. We could do that, since composition
of the initial program state with any other state produces the other state
again. Also note that call stacks of both states (i.e. []) are composed with
a call stack containing a single special record of the form $(t, \kappa)$.
This type of call stack record is introduced only for templates of recursive
functions. First of all, this single record represents any number of
subsequent recursive calls done by classic symbolic execution. And the record
also saves a reference to the template $t$ and the parameter $\kappa$ used in
the instantiation. We note that edges from the root to its successors are
labelled by expressions $\gamma_f = t.\varphi_f[\![\kappa]\!]$ and
$\gamma_e = t.\varphi_e[\![\kappa]\!]$. Having computed successors of the
root, we continue by classic symbolic execution independently from both of
these vertices, until we reach the location g. For both executions we do the
same thing at the location g. Let us consider the execution continuing from
the successor referencing location f. We need to instantiate the second part
of the template representing returns from the recursive calls. So, we remove
the record $(t, \kappa)$ from the top of the call stack, but we take the
template $t$ and the parameter $\kappa$ stored in the record. In general,
between both instantiation parts of a given template, any code might be
executed, many other templates can be instantiated and even the same template
can be instantiated several times, always with different (fresh) parameters.
That is why we save the template and the parameter in the stack record. Let
$s_1$ be a program state of the current leaf vertex of the tree. We create
its only successor vertex labelled with program state
$s_g^1 = (s_1.\theta \circ t.\theta[\![\kappa]\!], s_1.\varphi, [], g)$. We
see that there are two differences between states $s_1$ and $s_g^1$. First of
all, the call stack of $s_g^1$ does not contain the special record
$(t, \kappa)$ as we have popped it from the stack. And second, the symbolic
memory of $s_g^1$ is the composition
$s_1.\theta \circ t.\theta[\![\kappa]\!]$. Further classic symbolic execution
from the vertex terminates, since we are leaving the exit location of the
starting function. We proceed similarly for the other run of symbolic
execution (from the second successor of the root), where we get the final
program state

$$s_g^2 = (s_2.\theta \circ t.\theta[\![\kappa]\!], s_2.\varphi, [], g).$$

To summarise, a general scheme for compact symbolic execution of the
examples above is as follows. We enumerate parts in a given program producing
paths with regularities in program states along them. Such sources are mainly
cyclic paths and pairs of cyclic paths representing recursion. For each
enumerated part we compute a template. Then we run compact symbolic execution
with the computed templates.

3 Definition 

In this section, we give a precise definition of templates parametrised by a
single parameter. Templates for recursion consist of two parts instantiated
independently into the symbolic execution tree. These instances share the
same parameter. We therefore show a process of information passing between
different instances of the same template. We further present a compact
symbolic execution algorithm using templates with one parameter with possible
information exchange between instances. We start with basic terms valid for
compact symbolic execution with any kind of templates. We assume for the rest
of this section that P is a program.

An injective function $\theta_I$ from a set of all program variables of P to
a set of symbols $\{\alpha_0, \alpha_1, \alpha_2, \ldots\}$ is an initial
symbolic memory of P. For each program variable a its symbol $\theta_I(a)$
represents some yet unknown value of that variable. So, $\theta_I(a)$ must
belong to the domain of a (i.e. $\theta_I(a)$ is of a's type). Further, a
numeric symbolic expression is an application of operators to numeric
constants and symbols. A boolean symbolic expression is either an equality or
inequality predicate over numeric symbolic expressions, or an application of
logical connectives to other boolean symbolic expressions. A symbolic
expression is either a numeric or a boolean symbolic expression. We have
already given the definition of symbolic memory, call stack and program state
in Section 2. But we in addition define for any program state
$s = (\theta, \varphi, S, l)$ that $\theta(a) = \theta_I(a)$, for each local
variable a undefined at location $l$. Also note that $\theta_I$ is just a
special symbolic memory.

The pseudo-code of Algorithm 1 represents two algorithms. If we consider
only unmarked lines, we get the algorithm of classic symbolic execution. If
we add lines marked with □ we get the algorithm of compact symbolic execution
with templates with a single parameter. The lines marked by * are responsible
for construction of the symbolic execution tree. Obviously, both classic and
compact symbolic executions can appear in both versions: with and without
construction of the tree.

We now describe the algorithm of classic symbolic execution. At line 2 we
create the initial program state and then we insert it into a queue Q. The
queue Q keeps all program states for which we have not yet computed successor
program states. Until Q becomes empty, we iterate the loop at lines 5-38. At
line 7 we detect whether the currently processed program state s is final or
not. If it is not, we compute its successors at line 32. In short, the
function computeClassicSuccessors either executes actions of out-edges from
location s.l or it resolves return from a call, if s.l is a function exit
location. We already gave an intuition how to symbolically execute actions at
the beginning of Section 2. We further see at line 34 that we discard all
successors of s whose path conditions are not satisfiable. Discarded states
do not represent real behaviour of the program.
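
The worklist structure of the unmarked lines can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: the callbacks `is_final`, `successors` and `satisfiable` are hypothetical parameters, and the toy program below tracks a single symbol through an interval instead of a real path condition.

```python
from collections import deque

def execute_symbolically(initial_state, is_final, successors, satisfiable):
    """Queue-driven exploration: keep only successors with satisfiable conditions."""
    final_states = []
    queue = deque([initial_state])
    while queue:
        s = queue.popleft()
        if is_final(s):
            final_states.append(s)
            continue
        for nxt in successors(s):
            if satisfiable(nxt):
                queue.append(nxt)
    return final_states

# Toy program over one symbol a0, with a state (location, lo, hi) meaning
# "the path condition restricts a0 to the interval [lo, hi]".
def toy_successors(state):
    loc, lo, hi = state
    if loc == 0:    # branch on a0 >= 0
        return [(1, max(lo, 0), hi), (2, lo, min(hi, -1))]
    if loc == 1:    # branch on a0 < 0, infeasible after a0 >= 0
        return [("X", lo, min(hi, -1)), ("Y", max(lo, 0), hi)]
    return [("Z", lo, hi)]  # location 2: skip to the exit

finals = execute_symbolically(
    (0, -5, 5),
    is_final=lambda s: s[0] in ("X", "Y", "Z"),
    successors=toy_successors,
    satisfiable=lambda s: s[1] <= s[2],
)
```

The state reaching exit X has an empty interval, so it is discarded in the same way as successors with unsatisfiable path conditions are discarded at line 34.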

Now we focus on the *-version of the algorithm. We create the root of the
tree labelled by the initial program state at line 4. When processing a state
s inside the loop we take the only leaf in the tree labelled with s at
line 33. We compute its successor vertices at lines 36 and 37. Note that the
successors are labelled by successor states of s.

We have to postpone the description of the □-version of the algorithm until
we have properly defined the templates the algorithm uses. The first step
toward the definition is the introduction of parameters and their
substitution.

We distinguish a set $\{\kappa, \tau, \kappa_1, \tau_1, \kappa_2, \tau_2, \ldots\}$
of variables called parameters, ranging over non-negative integers. We extend
numeric symbolic expressions such that they may also contain application of
operators to parameters. We allow boolean symbolic expressions to contain
quantification of parameters. We further naturally extend symbolic memories,
call stacks and program states to contain symbolic expressions with
parameters. When we want to emphasise that $\kappa$ is a set of all
parameters appearing in a symbolic expression $\varphi$, we denote it as
$\varphi[\![\kappa]\!]$. And if we want to emphasise that a symbolic
expression $\varphi$ does not contain any parameter, we denote it as
$\varphi[\![\,]\!]$. We naturally extend the notations above to symbolic
memories, call stacks and program states.

We now describe substitution of parameters. Each function from a finite set
of parameters to non-negative integers is a valuation. Let
$\varphi[\![\kappa]\!]$, $\theta[\![\kappa]\!]$, $S[\![\kappa]\!]$ and
$s[\![\kappa]\!]$ be a symbolic expression, a symbolic memory, a call stack
and a program state respectively, $\kappa \neq \emptyset$ and $\nu$ be a
valuation defined for all parameters in $\kappa$. Then we compute
$\varphi[\nu]$ from $\varphi[\![\kappa]\!]$ such that we substitute all
parameters in $\varphi$ by related integers in $\nu$. We compute
$\theta[\nu]$ from $\theta[\![\kappa]\!]$ such that we substitute all
parameters in all the expressions in $\theta$ by related integers in $\nu$.
Substitution of call stack parameters is a bit more complicated, since we
introduced the special form $(t, \kappa)$ of a stack record in the last
example of Section 2. Therefore, to prepare ground for stack equivalence, we
compute $S[\nu]$ from $S[\![\kappa]\!]$ in the following two steps: (1) We
update each record $(\sigma, l)$ of the call stack $S$ to $(\sigma[\nu], l)$
(note that $\sigma$ is basically a symbolic memory, only restricted to local
variables). (2) Each record of the special form $(t, \kappa)$ in the call
stack from the previous step is either
Algorithm 1: executeSymbolically
Input:  P - program to be executed
        d - set of template detectors (only in □-version)
Output: E - set of final program states
        T - symbolic execution tree of P (only in *-version)

□  1  Let p be a set of all templates detected in P by detectors d
   2  $s_0 := (\theta_I, \mathit{true}, [], \text{entry location of the starting function})$
   3  Let Q be a queue of program states initially containing only $s_0$
*  4  Create a root vertex of T labelled with $s_0$
   5  repeat
   6    Extract the first program state s from Q
   7    if s.l is the exit location of the starting function or an error location then
   8      Insert s into E
   9    else
  10      $S := \emptyset$
□ 11      if $\mathrm{top}(s.S) = (t, \kappa) \wedge s.l = s.S.t.l'$ then   /* returning from recursion */
□ 12        $t := s.S.t$
□ 13        $\kappa := s.S.\kappa$
□ 14        Replace all occurrences of the former parameter in t by $\kappa$
□ 15        $s' := (s.\theta \circ t.\theta[\![\kappa]\!], s.\varphi, \mathrm{pop}(s.S), t.l')$
□ 16        Insert s' into S
□ 17      else
□ 18        p' := getTemplatesAt(s.l, p)
□ 19        if $p' \neq \emptyset$ then
□ 20          t := chooseTemplate(p')
□ 21          $\kappa$ := getFreshParam()
□ 22          Replace all occurrences of the former parameter in t by $\kappa$
□ 23          if t is a recursion template then   /* recursive calling */
□ 24            foreach i = 1, ..., t.n do
□ 25              $s' := s \circ (t.\theta_i, t.\varphi_i, [(t, \kappa)] \circ t.S_i, t.l_i)[\![\kappa]\!]$
□ 26              Insert s' into S
□ 27          else   /* t is a general template with one parameter */
□ 28            foreach i = 1, ..., t.n do
□ 29              $s' := s \circ (t.\theta_i, t.\varphi_i, t.S_i, t.l_i)[\![\kappa]\!]$
□ 30              Insert s' into S
□ 31        else   /* applying classic symbolic execution step */
  32          S := computeClassicSuccessors(P, s)
* 33      Let u be a leaf of T whose label is s
  34      foreach program state $s' \in S$ such that $s'.\varphi$ is satisfiable do
  35        Insert s' at the end of Q
* 36        Insert a new vertex v labelled with s' into T
* 37        Insert an edge (u, v) into T
  38  until Q becomes empty
  39  return E
* 40         T
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discarded, if v(k) = 0, or it is replaced by i/(k) records (_L, J_), where symbol 
$\bot$ represents any possible content. Therefore, the record $(\bot, \bot)$ represents any
possible stack record (the first $\bot$ in the record represents any possible content
of $\sigma$ and the second $\bot$ represents any possible program location).
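
The two substitution steps for call stacks can be sketched in Python. This is only an illustration under our own assumptions: the tuple encodings of records, the `BOTTOM` marker and the use of format strings for parametrised expressions are hypothetical and not part of the formal definitions.

```python
from typing import Dict, List, Tuple

BOTTOM = "⊥"  # stands for "any possible content"

# Hypothetical encodings: a regular record is ("rec", sigma, location),
# a special record is ("tmpl", template_name, parameter_name).
Record = Tuple


def substitute_stack(stack: List[Record], valuation: Dict[str, int]) -> List[Record]:
    """Apply a valuation to a call stack, expanding special records."""
    result: List[Record] = []
    for record in stack:
        if record[0] == "tmpl":
            _, _template, param = record
            # Step (2): the record is discarded when the parameter maps to 0,
            # otherwise it becomes that many (⊥, ⊥) records.
            result.extend([("rec", BOTTOM, BOTTOM)] * valuation[param])
        else:
            _, sigma, location = record
            # Step (1): substitute parameters inside the local memory sigma.
            # Expressions are format strings such as "alpha_i + {k}".
            new_sigma = {var: expr.format(**valuation) for var, expr in sigma.items()}
            result.append(("rec", new_sigma, location))
    return result
```

For instance, with the valuation `{"k": 2}` a special record `("tmpl", "t", "k")` expands into two `(⊥, ⊥)` records, and with `{"k": 0}` it disappears from the stack.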

We often use the following simplified notation. If an expression $\varphi$ contains
exactly one parameter $\kappa$ and $\{(\kappa, \nu)\}$ is a valuation, then we write
$\varphi\llbracket\kappa\rrbracket$ and $\varphi\llbracket\nu\rrbracket$ instead of
$\varphi\llbracket\{\kappa\}\rrbracket$ and $\varphi\llbracket\{(\kappa, \nu)\}\rrbracket$
respectively. The notation also applies to symbolic memories, call stacks and
program states.

Next we define composition of program states and equivalence between them. 
We also express some basic equivalences for compositions. 

Definition 2 (Composition) Let $S = [r_1, \ldots, r_m]$ and $S' = [r'_1, \ldots, r'_n]$ be
call stacks and $s = (\theta, \varphi, S, l)$ and $s' = (\theta', \varphi', S', l')$ be program
states. Then the composite program state is
$s \circ s' = (\theta \circ \theta',\ \varphi \wedge \theta(\varphi'),\ S \circ (\theta \circ S'),\ l')$,
where $\theta(\varphi')$ is a symbolic expression constructed from $\varphi'$ such that all
symbols $\alpha_i$ in $\varphi'$ are simultaneously substituted by symbolic expressions
$\theta(\theta_I^{-1}(\alpha_i))$, $\theta \circ \theta'$ is a symbolic memory such that for each
variable $a$ we have $(\theta \circ \theta')(a) = \theta(\theta'(a))$,
$\theta \circ S' = [\bar{r}'_1, \ldots, \bar{r}'_n]$, where each $\bar{r}'_i$ is equal to $r'_i$
except the first component being $\bar{r}'_i.\sigma = \theta \circ r'_i.\sigma$, and
$S \circ (\theta \circ S') = [r_1, \ldots, r_m, \bar{r}'_1, \ldots, \bar{r}'_n]$.
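
A minimal Python sketch of composition may help to read the definition. It rests on simplifying assumptions of ours: expressions are nested tuples, every symbol is identified with the name of its variable (so the identity memory maps a name to itself), and the helper names `apply`, `compose_memory` and `compose_state` are hypothetical.

```python
from typing import Dict, Union

Expr = Union[int, str, tuple]  # constant, symbol name, or (operator, arg, ...)
Memory = Dict[str, Expr]


def apply(theta: Memory, expr: Expr) -> Expr:
    """theta(expr): simultaneously replace every symbol by its value in theta."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        # A symbol without an entry stands for itself (identity memory).
        return theta.get(expr, expr)
    op, *args = expr
    return (op, *(apply(theta, a) for a in args))


def compose_memory(theta: Memory, theta2: Memory) -> Memory:
    """(theta o theta2)(a) = theta(theta2(a)) for each variable a."""
    variables = set(theta) | set(theta2)
    return {a: apply(theta, theta2.get(a, a)) for a in variables}


def compose_state(s, s2):
    """s o s2 for program states of the form (memory, condition, stack, location)."""
    theta, phi, stack, _loc = s
    theta2, phi2, stack2, loc2 = s2
    # theta o S': only the local memory of each record of the second stack changes.
    shifted = [({a: apply(theta, e) for a, e in sigma.items()}, l) for sigma, l in stack2]
    return (
        compose_memory(theta, theta2),
        ("and", phi, apply(theta, phi2)),
        stack + shifted,
        loc2,
    )
```

Composing a state that increments `i` by 1 with a state whose condition is `i < 10` yields the conjunct `(i + 1) < 10`, i.e. the second condition is evaluated in the memory left by the first state.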

Definition 3 (Equivalence) Let $\varphi, \varphi'$ be symbolic expressions, $\theta, \theta'$ be
symbolic memories, $S = [r_1, \ldots, r_m]$, $S' = [r'_1, \ldots, r'_n]$ be call stacks and
$s, s'$ be program states. Then $\varphi \equiv \varphi'$, if $\varphi$ and $\varphi'$ are either
logically equivalent boolean symbolic expressions or numeric symbolic expressions
such that $(\varphi = \varphi') \equiv \mathit{true}$. $\theta \equiv \theta'$, if for each variable
$a$ we have $\theta(a) \equiv \theta'(a)$. $S \equiv S'$, if $m = n$ and for each
$i \in \{1, \ldots, m\}$ we have $r_i.\sigma$ and $r'_i.\sigma$ are defined for the same
variables with equivalent values and $r_i.l = r'_i.l$. And $s \equiv s'$, if both $s$ and
$s'$ have equal or equivalent components.

When returning from a function call, values of local variables are discarded. 
Therefore, if we have two program states at the same exit location of a function, 
we may restrict equivalence between symbolic memories of these states only 
to global variables. Therefore we define also the following equivalence between 
program states. 

Definition 4 (Equivalence on Global Variables) Let $s$ and $s'$ be program
states. Then $s$ is equivalent on global variables with $s'$, written $s \simeq s'$, if they
have equal or equivalent components except the one with symbolic memories, where
for each global variable $a$ we require $s.\theta(a) \equiv s'.\theta(a)$.

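We summarise basic equivalences between composed program states in the
following lemma. We do not provide a proof since the equivalences are mostly
obvious or easy to check.

Lemma 1 (Equivalent Compositions) Let $s$, $s'$ and $s''$ be program states,
$\boldsymbol{\nu}$ and $\boldsymbol{\nu}'$ be valuations of all parameters in $s$ and $s'$
respectively such that $\boldsymbol{\nu} \cup \boldsymbol{\nu}'$ is also a valuation,
$\theta$, $\theta'$ and $\theta''$ be symbolic memories and $\varphi$ and $\psi \wedge \psi'$ be
symbolic expressions. Then
$s \circ (s' \circ s'') \equiv (s \circ s') \circ s''$,
$s\llbracket\boldsymbol{\nu}\rrbracket \circ s'\llbracket\boldsymbol{\nu}'\rrbracket \equiv (s \circ s')\llbracket\boldsymbol{\nu} \cup \boldsymbol{\nu}'\rrbracket$,
$\theta \circ (\theta' \circ \theta'') \equiv (\theta \circ \theta') \circ \theta''$,
$(\theta \circ \theta')(\varphi) \equiv \theta(\theta'(\varphi))$ and
$\theta(\psi) \wedge \theta(\psi') \equiv \theta(\psi \wedge \psi')$.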

Before we formulate a definition of templates with one parameter we give
its intuition. Let us consider a part of the program P with an entry location $e$
and $n$ distinct exit locations $x_1, \ldots, x_n$. We saw in Section 2 that key properties
for building a template of the part are program states
$s_1\llbracket\kappa\rrbracket, \ldots, s_n\llbracket\kappa\rrbracket$ at exit
locations $x_1, \ldots, x_n$. We need to ensure that states $s_i\llbracket\kappa\rrbracket$
correctly represent behaviour of the analysed part. King proved [8] that path
conditions at leaf vertices of the symbolic execution tree T of P are satisfiable.
Therefore, if $s_i.\varphi$ is not satisfiable, then there cannot be a path in T
traversing the part from $e$ to $x_i$. The exit $x_i$ is thus useless for the
construction of the template and we omit it. King further showed [8] that for two
different leaf vertices $u$ and $v$ of T we have $u.\varphi \wedge v.\varphi \equiv \mathit{false}$.
This statement is also valid for program parts. So, we require
$(s_i.\varphi \wedge s_j.\varphi) \equiv \mathit{false}$ for all different $i$ and $j$. We
summarise these requirements in the following definition.

Definition 5 (Templates with one parameter) Let T be the symbolic execution
tree of P computed by the ♦-version of Algorithm 1, $n > 0$ be an integer,
$l, l', l_1, \ldots, l_n$ be locations in P, $\kappa$ be a parameter,
$\theta\llbracket\kappa\rrbracket, \theta_1\llbracket\kappa\rrbracket, \ldots, \theta_n\llbracket\kappa\rrbracket$
be symbolic memories,
$\varphi_1\llbracket\kappa\rrbracket, \ldots, \varphi_n\llbracket\kappa\rrbracket$ be satisfiable
boolean symbolic expressions such that for each $i, j \in \{1, \ldots, n\}$, $i \neq j$ we have
$(\varphi_i \wedge \varphi_j) \equiv \mathit{false}$ and let
$S_1\llbracket\kappa\rrbracket, \ldots, S_n\llbracket\kappa\rrbracket$ be call stacks.

A tuple $t = (l, n, \{(\theta_1, \varphi_1, S_1, l_1), \ldots, (\theta_n, \varphi_n, S_n, l_n)\})$
is a template with one parameter $\kappa$ in P, if

(L1) All the locations $l, l_1, \ldots, l_n$ in $t$ are neither entry nor exit ones.
(L2) For each path $\pi = u\omega$ in T from any vertex $u$ satisfying $u.l = t.l$ to a leaf,
     there is a vertex $w \in \omega$, an index $i \in \{1, \ldots, n\}$ and an integer
     $\nu \geq 0$, such that
     $w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, t.S_i, t.l_i)\llbracket\nu\rrbracket$.
(L3) For each vertex $u$ of T, an index $i \in \{1, \ldots, n\}$ and non-negative integer $\nu$
     such that $u.l = t.l$ and $(u.\varphi \wedge u.\theta(t.\varphi_i\llbracket\nu\rrbracket))$ is
     satisfiable, there is a successor $w$ of $u$ in T such that
     $w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, t.S_i, t.l_i)\llbracket\nu\rrbracket$.

A tuple $t = (l, n, \{(\theta_1, \varphi_1, S_1, l_1), \ldots, (\theta_n, \varphi_n, S_n, l_n)\}, \theta, l')$
is a recursion template with one parameter $\kappa$ in P, if

(R1) $t.l$ and $t.l'$ are entry and exit locations of the same function respectively and
     $t.l'$ is the target vertex of an edge with a call action of that function. All the
     locations $l_1, \ldots, l_n$ in $t$ are neither entry nor exit ones.
(R2) For each path $\pi = u\omega$ in T from any vertex $u$ satisfying $u.l = t.l$ to a leaf,
     there is a non-leaf vertex $w \in \omega$, an index $i \in \{1, \ldots, n\}$ and an integer
     $\nu \geq 0$, such that
     $w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, [(t, \kappa)] \circ t.S_i, t.l_i)\llbracket\nu\rrbracket$.
     Further, if there is the first successor $\bar{w}$ of $w$ in $\pi$ such that
     $\bar{w}.l = t.l'$ and $\bar{w}.S = w.S$, then there is a non-leaf vertex $\bar{u}$ in a
     suffix of $\pi$ starting with $\bar{w}$ such that
     $\bar{u}.s \simeq (\bar{w}.\theta \circ t.\theta\llbracket\nu\rrbracket,\ \bar{w}.\varphi,\ u.S,\ t.l')$.
(R3) For each vertex $u$ of T, an index $i \in \{1, \ldots, n\}$ and non-negative integer $\nu$
     such that $u.l = t.l$ and $(u.\varphi \wedge u.\theta(t.\varphi_i\llbracket\nu\rrbracket))$ is
     satisfiable, there is a successor $w$ of $u$ in T such that
     $w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, [(t, \kappa)] \circ t.S_i, t.l_i)\llbracket\nu\rrbracket$.

Note that requirements (L2) and (R2) guarantee that no path in T with
vertices $u$ and $v$ such that $u.l = l$ and $v.l = l_i$ is suppressed by the state
$(t.\theta_i, t.\varphi_i, t.S_i, t.l_i)\llbracket\kappa\rrbracket$. And requirements (L3) and
(R3) guarantee that the program state
$(t.\theta_i, t.\varphi_i, t.S_i, t.l_i)\llbracket\kappa\rrbracket$ does not produce unreal
paths. Also note that in requirement (R2) we use the restriction of equivalence to
global variables for the phase of returning from recursive calls. Since values of
local variables are not important when returning from a function call, the
restriction may help to simplify detection of a recursion template.

We are ready to describe the □-version of Algorithm 1. At line 1 we detect
templates with one parameter in the passed program P. That is a task for
so-called template detectors. We discuss a possible construction of such a detector
in Section 5. The only purpose of lines 11-32 is to compute successor states of
a currently processed program state $s$. Let us first assume the test at line 11 is
false. So, we get to line 18. There we call a system function getTemplatesAt,
which selects those templates whose entry locations match the actual program
location $s.l$. If the selection is not empty we may instantiate one of the selected
templates. A system function chooseTemplate is supposed to choose exactly one
template $t$ to be instantiated. We may for example choose randomly. We do not
put any constraints on the selection strategy. To prevent parameter collisions
we first get a fresh one at line 21 and then we replace the parameter used
in $t$ by default by the fresh one. Now we have two possibilities. Either $t$ is a
recursion template or not. In the first case we get to a loop at line 24. There
we create $t.n$ successors of the program state $s$ (see line 25). Note that the call
stack of the $i$-th successor state is of the form $s.S \circ [(t, \kappa)] \circ t.S_i$. It
means that the special record is at the position in the stack where we entered the
recursive function. The only special record $(t, \kappa)$ in the call stack represents
any possible number of subsequent recursive calls in classic symbolic execution.
The record also saves a reference to the template $t$ and the parameter $\kappa$ for the
later phase of returning from the recursive calls. If $t$ is not a recursion template,
then it must be our general purpose template with one parameter (since we do not
consider any other kinds of templates in this paper). So we get to line 28 in the
algorithm. There we also create successors of the program state $s$ (see line 29).

It remains to discuss the computation of successors when the condition at line 11
is true. The condition says that the location $s.l$ references the exit location of a
function and that there is the special record $(t, \kappa)$ at the top of the call stack
$s.S$. In other words, we reached the moment when we have to return from
recursive calls. We first retrieve the recursion template and the parameter used
in the instantiation of $t$ (see lines 12 and 13). After substitution of the default
parameter by the retrieved one, we finish the instantiation of $t$ by computing
the only successor of the actual state. The successor state represents the effect
of all the returns from recursive calls done previously. This is ensured by using
the same parameter for both phases of the instantiation of the template $t$.
The number of recursive calls therefore matches the number of returns from them.
Also note that the call stack of the successor does not contain the special record.
We finish the description of the algorithm by the following observation. The
expressions computing successor states at lines 15, 25 and 29 precisely match
the corresponding expressions in Definition 5. Note that at line 15 the call stack
$pop(s.S)$ must be equal to that of a program state for which we previously got
to line 24. And this program state had to be related to the entry location of a
function causing the recursive calls.
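
The outer worklist loop of Algorithm 1 can be sketched in Python as follows. This is a simplified skeleton under our own assumptions: the whole successor computation of lines 10-32 is hidden behind a hypothetical `successors` callback, the satisfiability check of line 34 behind a hypothetical `satisfiable` callback, and the tree T is not built.

```python
from collections import deque


def execute_symbolically(s0, is_final, successors, satisfiable):
    """Breadth-first exploration of program states.

    is_final(s)    -- s is at the exit location of the starting function
                      or at an error location (line 7)
    successors(s)  -- template instantiation or a classic step (lines 10-32)
    satisfiable(s) -- the path condition of s is satisfiable (line 34)
    """
    final_states = []
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        if is_final(s):
            final_states.append(s)
            continue
        for s_next in successors(s):
            if satisfiable(s_next):
                queue.append(s_next)
    return final_states
```

As a toy usage, states can be plain integers: `execute_symbolically(0, lambda s: s >= 3, lambda s: [s + 1, s + 2], lambda s: True)` explores every way of reaching a value of at least 3 by steps of 1 or 2. With real program states the loop need not terminate, exactly as in the algorithm above.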

4 Soundness and Completeness 

In this section we formulate and prove soundness and completeness theorems
for compact symbolic execution using recursion and general templates with one
parameter. The theorems say that both classic and compact symbolic execution
explore the same set of real paths of P. To avoid repetitions we assume for the
remainder of this section that P is a program, and T and T' are symbolic
execution trees of the program P computed by the ♦- and □,♦-versions of
Algorithm 1 respectively.

Lemma 2 Call stack records pushed at line 25 of Algorithm 1 cannot be adjacent
in call stacks of vertices of T'.

Proof. Follows immediately from the requirement for locations of templates in
Definition 5 and from the fact that reaching line 25 requires that the processed
state references a function entry location.

Lemma 3 Let $u \in T$, $u' \in T'$, $u'.S \neq [\,]$, $top(u'.S) \neq (t, \kappa)$, $u'.l$ is an exit
location and $u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$ for some valuation
$\boldsymbol{\nu}$. Then there are the only direct successors $w \in T$ and $w' \in T'$ of $u$
and $u'$ respectively and they satisfy
$w.s \equiv w'.s\llbracket\boldsymbol{\nu}\rrbracket$.

Proof. Follows directly from Lemma 2 and from the fact that successors of $u'$
are computed at line 32 of Algorithm 1.

Theorem 1 (Soundness) For each leaf vertex $e \in T$ there is a leaf vertex
$e' \in T'$ and a valuation $\boldsymbol{\nu}$ of all parameters in $e'.s$ such that
$e.s \equiv e'.s\llbracket\boldsymbol{\nu}\rrbracket$.

Proof. Let $\pi$ be the path in T from the root to the leaf vertex $e$. We prove the
theorem by the following induction:

Basic case: The root vertices $r$ and $r'$ of T and T' respectively are labelled
by the same program state $s_0$ (see lines 2 and 4). So,
$r.s \equiv r'.s\llbracket\boldsymbol{\nu}\rrbracket$, for $\boldsymbol{\nu} = \emptyset$.
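Inductive step: Let $u \in \pi$, $u \neq e$, $u'$ be a vertex of T' and $\boldsymbol{\nu}$ be a
valuation such that $u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$. We show there is a
successor $w$ of $u$ in $\pi$, a successor vertex $w'$ of $u'$ in T' and a valuation
$\boldsymbol{\nu}'$ such that $w.s \equiv w'.s\llbracket\boldsymbol{\nu}'\rrbracket$. And we
further show there is no vertex $v'$ in the path between $u'$ and $w'$ in T' such
that successors of $v'.s$ are computed at line 25. There are four possible cases in
Algorithm 1 for $u'.s$:

(1) We reach line 28. According to Definition 5 (L2), there is a successor
vertex $w$ of $u$ in $\pi$, an index $i$ and a non-negative integer $\nu$ for $\kappa$ such that

\[
\begin{aligned}
w.s &\equiv u.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv \underbrace{(u'.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ t.S_i\llbracket\kappa\rrbracket,\ t.l_i))}_{s'}\llbracket\boldsymbol{\nu} \cup \{(\kappa, \nu)\}\rrbracket,
\end{aligned}
\]

where $s'$ is the $i$-th direct successor of $u'.s$ computed at line 29. And since
$w \in T$, we have $w'.\varphi$ is satisfiable. Therefore, there is a direct successor $w'$
of $u'$ in T' with $w'.s = s'$.

(2) We reach line 24. According to Definition 5 (R2), there is a successor
vertex $w$ of $u$ in $\pi$, an index $i$ and a non-negative integer $\nu$ for $\kappa$ such that

\[
\begin{aligned}
w.s &\equiv u.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ [(t, \kappa)] \circ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv \underbrace{(u'.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ [(t, \kappa)] \circ t.S_i\llbracket\kappa\rrbracket,\ t.l_i))}_{s'}\llbracket\boldsymbol{\nu} \cup \{(\kappa, \nu)\}\rrbracket,
\end{aligned}
\]

where $s'$ is the $i$-th direct successor of $u'.s$ computed at line 25. And since
$w \in T$, we have $w'.\varphi$ is satisfiable. Therefore, there is a direct successor $w'$
of $u'$ in T' with $w'.s = s'$.

(3) We reach line 12. Let $\pi'$ be a path in T' from the root to the vertex $u'$.
According to connections between vertices $u'$ constructed for vertices $u$ along
$\pi$, there is a predecessor $x'$ of $u'$ in $\pi'$, which pushed (at line 25) the record
being at the top of $u'.S$. Obviously, successors of $x'.s$ are computed at line 25.
Therefore, there is $x \in \pi$ such that $x.s \equiv x'.s\llbracket\boldsymbol{\nu}\rrbracket$.
According to case (2) there is a successor $y$ of $x$ in $\pi$ and a direct successor $y'$
of $x'$ in $\pi'$ such that $y.s \equiv y'.s\llbracket\boldsymbol{\nu}\rrbracket$. Note that $y'.s$
uses the parameter $\kappa$ retrieved from stack $u'.S$ at line 13. Therefore, valuation
$\boldsymbol{\nu}$ defines an integer $\nu = \boldsymbol{\nu}(\kappa)$. Also note that $u$ is the
first successor of $y$ in $\pi$ with $u.l$ being an exit location and $u.S = y.S$. Otherwise
we would apply this case (3) for some other vertex lying between $y'$ and $u'$ in
$\pi'$. Therefore, from Definition 5 (R2) there is a non-leaf vertex $v$ in a suffix of
$\pi$ starting with $u$ such that

\[
\begin{aligned}
v.s &\simeq (u.\theta \circ t.\theta\llbracket\kappa\rrbracket,\ u.\varphi,\ x.S,\ t.l')\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv (u'.\theta\llbracket\boldsymbol{\nu}\rrbracket \circ t.\theta\llbracket\kappa\rrbracket,\ u'.\varphi\llbracket\boldsymbol{\nu}\rrbracket,\ pop(u'.S)\llbracket\boldsymbol{\nu}\rrbracket,\ t.l')\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &= \underbrace{(u'.\theta \circ t.\theta\llbracket\kappa\rrbracket,\ u'.\varphi,\ pop(u'.S),\ t.l')}_{s'}\llbracket\boldsymbol{\nu}\rrbracket,
\end{aligned}
\]

where $s'$ is the only successor state of $u'.s$ computed at line 15. Since $v \in T$,
then $s'.\varphi$ is satisfiable and there is a direct successor $v'$ of $u'$ in T' with
$v'.s = s'$. And finally Lemma 3 ensures there are the only direct successors $w$
and $w'$ of $v$ and $v'$ respectively, such that
$w.s \equiv w'.s\llbracket\boldsymbol{\nu}\rrbracket$.

(4) Otherwise, we reach line 32. Since $u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$ and
we apply a classic symbolic execution step for $u'.s$, there must be a direct
successor $w$ of $u$ and a direct successor $w'$ of $u'$ such that
$w.s \equiv w'.s\llbracket\boldsymbol{\nu}\rrbracket$.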

Theorem 2 (Completeness) For each leaf vertex $e' \in T'$ there is a leaf vertex
$e \in T$ and a valuation $\boldsymbol{\nu}$ of all parameters in $e'.s$ such that
$e.s \equiv e'.s\llbracket\boldsymbol{\nu}\rrbracket$.

Proof. Let $\pi'$ be the path in T' from the root to the leaf vertex $e'$. We prove the
theorem by the following induction:

Basic case: The root vertices $r$ and $r'$ of T and T' respectively are labelled
by the same program state $s_0$ (see lines 2 and 4). Let us construct a non-empty
set $U$ of vertices of T such that for each valuation $\boldsymbol{\nu}$ of all parameters in
$r'.s$ such that $r'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is satisfiable, there is $u \in U$
such that $u.s \equiv r'.s\llbracket\boldsymbol{\nu}\rrbracket$. Obviously $U = \{r\}$, because
$r'.\varphi$ contains no parameter (so $r.s \equiv r'.s\llbracket\boldsymbol{\nu}\rrbracket$, for each
$\boldsymbol{\nu}$).

Inductive step: Let $u' \in \pi'$, $u' \neq e'$ and $U$ be a non-empty set of vertices of
T such that for each valuation $\boldsymbol{\nu}$ of all parameters in $u'.s$ such that
$u'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is satisfiable, there is $u \in U$ such that
$u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$. We show there is a successor $w'$ of $u'$ in
$\pi'$ and a non-empty set $W$ of vertices of T such that for each valuation
$\boldsymbol{\nu}'$ of all parameters in $w'.s$ such that
$w'.\varphi\llbracket\boldsymbol{\nu}'\rrbracket$ is satisfiable, there is $w \in W$ such that
$w.s \equiv w'.s\llbracket\boldsymbol{\nu}'\rrbracket$. And we further show that each $w \in W$ is
a successor of some $u \in U$ and there is no vertex $v'$ between $u'$ and $w'$ in $\pi'$
such that successors of $v'.s$ are computed at line 25. There are four possible
cases in Algorithm 1 for $u'.s$:

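(1) We reach line 28. Let $w'$ be a direct successor of $u'$ in $\pi'$. Obviously, $w'.s$
is one of the states $s'$ computed at line 29. Let $i$ be the index for which $w'.s = s'$.
The formula $w'.\varphi$ is satisfiable, since $w'$ is in T' (see condition at line 34). Let
$\boldsymbol{\nu}$ be a valuation for which $w'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is
satisfiable. And let $\boldsymbol{\nu}' = \boldsymbol{\nu} \setminus \{(\kappa, \nu)\}$, where $\nu$
is the integer assigned in $\boldsymbol{\nu}$ to the fresh parameter $\kappa$ introduced at
line 21. From line 29 we see that $u'.\varphi\llbracket\boldsymbol{\nu}'\rrbracket$ is satisfiable.
Therefore, there is a vertex $u \in U$ such that
$u.s \equiv u'.s\llbracket\boldsymbol{\nu}'\rrbracket$. According to Definition 5 (L3) there is a
successor $w$ of $u$ in T such that

\[
\begin{aligned}
w.s &\equiv u.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv u'.s\llbracket\boldsymbol{\nu}'\rrbracket \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv (u'.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ t.S_i\llbracket\kappa\rrbracket,\ t.l_i))\llbracket\boldsymbol{\nu}\rrbracket \\
    &= w'.s\llbracket\boldsymbol{\nu}\rrbracket.
\end{aligned}
\]

Therefore, $w \in W$.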

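(2) We reach line 24. Let $w'$ be a direct successor of $u'$ in $\pi'$. Obviously, $w'.s$
is one of the states $s'$ computed at line 25. Let $i$ be the index for which $w'.s = s'$.
The formula $w'.\varphi$ is satisfiable, since $w'$ is in T' (see condition at line 34). Let
$\boldsymbol{\nu}$ be a valuation for which $w'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is
satisfiable. And let $\boldsymbol{\nu}' = \boldsymbol{\nu} \setminus \{(\kappa, \nu)\}$, where $\nu$
is the integer assigned in $\boldsymbol{\nu}$ to the fresh parameter $\kappa$ introduced at
line 21. From line 25 we see that $u'.\varphi\llbracket\boldsymbol{\nu}'\rrbracket$ is satisfiable.
Therefore, there is a vertex $u \in U$ such that
$u.s \equiv u'.s\llbracket\boldsymbol{\nu}'\rrbracket$. According to Definition 5 (R3) there is a
successor $w$ of $u$ in T such that

\[
\begin{aligned}
w.s &\equiv u.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ [(t, \kappa)] \circ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv u'.s\llbracket\boldsymbol{\nu}'\rrbracket \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ [(t, \kappa)] \circ t.S_i\llbracket\kappa\rrbracket,\ t.l_i)\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv (u'.s \circ (t.\theta_i\llbracket\kappa\rrbracket,\ t.\varphi_i\llbracket\kappa\rrbracket,\ [(t, \kappa)] \circ t.S_i\llbracket\kappa\rrbracket,\ t.l_i))\llbracket\boldsymbol{\nu}\rrbracket \\
    &= w'.s\llbracket\boldsymbol{\nu}\rrbracket.
\end{aligned}
\]

Therefore, $w \in W$.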

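(3) We reach line 12. Let $x'$ be a predecessor of $u'$ in $\pi'$, which pushed (at
line 25) the record being at the top of $u'.S$. Obviously, successors of $x'.s$ are
computed at line 25. Further, let $y'$ and $v'$ be direct successors of $x'$ and $u'$ in
$\pi'$ respectively. The formula $v'.\varphi$ is satisfiable, since $v'$ is in T' (see condition
at line 34). Note that $v'$ is the only successor of $u'$ in T'. Let $\boldsymbol{\nu}$ be a
valuation for which $v'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is satisfiable. Note that
$\boldsymbol{\nu}$ defines an integer $\nu = \boldsymbol{\nu}(\kappa)$ for the parameter $\kappa$
retrieved from stack $u'.S$ at line 13, since $y'.s$ must have already used it. From
line 15 we see that $u'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is satisfiable. Therefore,
there is a vertex $u \in U$ such that $u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$. Let
$\pi$ be a path in T from the root to a leaf vertex and going through $u$. According
to connections between vertices of sets $U$ constructed for vertices $u'$ along $\pi'$,
there is a predecessor $x$ of $u$ in $\pi$, such that
$x.s \equiv x'.s\llbracket\boldsymbol{\nu}\rrbracket$. Since $y'$ is the direct successor of $x'$ in
$\pi'$ (i.e. there was computed a set $W$ for $y'$), there must also exist a vertex
$y \in \pi$ lying between $x$ and $u$ and $y.s \equiv y'.s\llbracket\boldsymbol{\nu}\rrbracket$. Note
that $u$ is the first successor of $y$ in $\pi$ with $u.l$ being an exit location and
$u.S = y.S$. Otherwise we would apply this case (3) for some other vertex lying
between $y'$ and $u'$ in $\pi'$. Therefore, from Definition 5 (R2) there is a non-leaf
vertex $v$ in a suffix of $\pi$ starting with $u$ such that

\[
\begin{aligned}
v.s &\simeq (u.\theta \circ t.\theta\llbracket\kappa\rrbracket,\ u.\varphi,\ x.S,\ t.l')\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv (u'.\theta\llbracket\boldsymbol{\nu}\rrbracket \circ t.\theta\llbracket\kappa\rrbracket,\ u'.\varphi\llbracket\boldsymbol{\nu}\rrbracket,\ pop(u'.S)\llbracket\boldsymbol{\nu}\rrbracket,\ t.l')\llbracket\{(\kappa, \nu)\}\rrbracket \\
    &\equiv (u'.\theta \circ t.\theta\llbracket\kappa\rrbracket,\ u'.\varphi,\ pop(u'.S),\ t.l')\llbracket\boldsymbol{\nu}\rrbracket \\
    &= v'.s\llbracket\boldsymbol{\nu}\rrbracket.
\end{aligned}
\]

And finally Lemma 3 ensures there are the only direct successors $w$ and $w'$ of $v$
and $v'$ respectively, such that $w.s \equiv w'.s\llbracket\boldsymbol{\nu}\rrbracket$. Therefore,
$w \in W$.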

(4) Otherwise, we reach line 32. Let $u$ be any vertex in $U$. Since
$u.s \equiv u'.s\llbracket\boldsymbol{\nu}\rrbracket$ for some valuation $\boldsymbol{\nu}$ for which
$u'.\varphi\llbracket\boldsymbol{\nu}\rrbracket$ is satisfiable and since all direct successors
of both $u$ and $u'$ are computed by a classic symbolic execution step, there must
be a direct successor $w$ of $u$ in T and a direct successor $w'$ of $u'$ in T' such that
$w.s \equiv w'.s\llbracket\boldsymbol{\nu}\rrbracket$. Note that both $u'.s$ and $w'.s$ have exactly
the same parameters. Therefore, $w \in W$.

5 Computation of Templates 

In this section we show one possible approach to computation of templates with
one parameter. We provide a detailed description of an algorithm computing a
template for a program part with a specified cyclic path, an entry location, and
several exit ones. Then we extend the concept of the algorithm to computation of
recursion templates for program parts.

5.1 Template for Program Part with Cyclic Path 

Let P be a program and let us suppose we have a program part of P with a
cyclic path, an entry location $e$ and some exit location $x$ (but there can be other
exits from the part). We show how to compute a symbolic memory
$\theta_x\llbracket\kappa\rrbracket$, a path condition $\varphi_x\llbracket\kappa\rrbracket$ and a
call stack $S_x\llbracket\kappa\rrbracket$ at the exit location $x$. The computation
of the remaining parts of the resulting template is then straightforward.

The algorithm proceeds in two steps. First, we compute a program state
$(\theta, \varphi, [\,], e)$ resulting from classic symbolic execution of the cyclic path of
the part exactly once, and a program state $(\hat\theta, \hat\varphi, \hat{S}, x)$ resulting
from classic symbolic execution of a path from $e$ to $x$. The second step is to
express $\theta_x\llbracket\kappa\rrbracket$, $\varphi_x\llbracket\kappa\rrbracket$ and
$S_x\llbracket\kappa\rrbracket$ in terms of the program states computed in the first step.

The computation of program states $(\theta, \varphi, [\,], e)$ and
$(\hat\theta, \hat\varphi, \hat{S}, x)$ requires running classic symbolic execution on the
analysed program part. But Algorithm 1 can only execute programs satisfying
Definition 1. Therefore, we create a new program, say P', representing the
analysed part.

We start with a program P' consisting of all variables of P and of all those
functions of P having at least one location of the cycle. Note that the cyclic path
of the part may traverse several functions through call sites. We now remove all
the locations and edges in P' which belong neither to the cycle nor to the path
from $e$ to $x$. We assume that $x$ does not belong to the cyclic path, since otherwise
we can always create its copy outside the cycle. Next we mark the function in P'
containing the entry location $e$ as the starting function of P' and we set $e$ to be
the entry location of the function. Then we create a new location $e'$ representing
the exit location from the starting function. Now we break the cyclic path in the
entry $e$ such that we redirect the only in-edge of $e$ (belonging to the cycle) to
$e'$. And finally we transform $x$ into an error location by adding a loop edge with
a skip action.

P' is now a program according to Definition 1. So, we can run the unmarked
version of Algorithm 1. Note that the algorithm must always terminate for P'.
Let E be the set of resulting program states. Then $|E| \leq 2$. If there is no
$s \in E$ such that $s.l = e'$, then we do not create the template for the part, since
there is no real path around the cycle. If there is no state $s \in E$ such that
$s.l = x$, then we discard the exit $x$ from the consideration for the template, since
it is impossible to leave the loop through $x$. Otherwise, E contains exactly two
program states, which are the states we are looking for.

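Now we show how to express $\theta_x\llbracket\kappa\rrbracket$,
$\varphi_x\llbracket\kappa\rrbracket$ and $S_x\llbracket\kappa\rrbracket$ in terms of the program
states computed above. Let T be a symbolic execution tree of P, computed by
the ♦-version of Algorithm 1. Further, let $u$ be a vertex of T such that $u.l = e$ and
$\pi = u \ldots u_1 \ldots u_2 \ldots u_\nu \ldots w$ be a path in T starting at $u$, iterating the
cycle of the part exactly $\nu \geq 0$ times, i.e. all the vertices $u_i$ have $u_i.l = e$, and
then it leaves the cycle into the vertex $w$ with $w.l = x$. We use memory
composition to express memories of vertices along $\pi$ as follows.

\[
\begin{aligned}
u_1.\theta &= u.\theta \circ \theta \\
u_2.\theta &= u_1.\theta \circ \theta = u.\theta \circ (\theta \circ \theta) \\
           &\ \,\vdots \\
u_\nu.\theta &= u_{\nu-1}.\theta \circ \theta = u.\theta \circ (\underbrace{\theta \circ \cdots \circ \theta}_{\nu})
\end{aligned}
\]

If we denote the composition of $i$ symbolic memories $\theta$ by $\theta^i$, where
$\theta^0 = \theta_I$ and $\theta^1 = \theta$, then we have $u_i.\theta = u.\theta \circ \theta^i$ and we get

\[
w.\theta = u.\theta \circ (\theta^\nu \circ \hat\theta).
\]

We proceed similarly to express path conditions of vertices along $\pi$.

\[
\begin{aligned}
u_1.\varphi &= u.\varphi \wedge u.\theta(\varphi) = u.\varphi \wedge (u.\theta \circ \theta^0)(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi)) \\
u_2.\varphi &= u_1.\varphi \wedge u_1.\theta(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi)) \wedge (u.\theta \circ \theta^1)(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \theta^1(\varphi)) \\
            &\ \,\vdots \\
u_\nu.\varphi &= u_{\nu-1}.\varphi \wedge u_{\nu-1}.\theta(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi))
\end{aligned}
\]

Using the following equivalence

\[
\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi) \equiv 0 \leq \nu \wedge \forall\tau\,(0 \leq \tau < \nu \rightarrow \theta^\tau(\varphi)),
\]

we can write

\[
\begin{aligned}
w.\varphi &= u_\nu.\varphi \wedge u_\nu.\theta(\hat\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi) \wedge \theta^\nu(\hat\varphi)) \\
          &\equiv u.\varphi \wedge u.\theta(0 \leq \nu \wedge \forall\tau\,(0 \leq \tau < \nu \rightarrow \theta^\tau(\varphi)) \wedge \theta^\nu(\hat\varphi)).
\end{aligned}
\]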

SMT solvers do not support the memory composition operation appearing in the
formula $w.\varphi$. Therefore, we need an equivalent declarative description of the
operation. Such a description is a parametrised symbolic memory
$\theta\llbracket\kappa\rrbracket$ where we require $\theta\llbracket\kappa\rrbracket \equiv \theta^\kappa$, for
any $\kappa \geq 0$. For a given symbolic memory $\theta$ we compute the content of
$\theta\llbracket\kappa\rrbracket$ per variable by applying the following two rules

\[
\frac{\theta(a) = \theta_I(a) + c,\quad a \text{ is of a numeric type},\quad c \text{ is a numeric constant of } a\text{'s type}}
     {\theta\llbracket\kappa\rrbracket(a) = \theta_I(a) + c \cdot \mathit{typeOf}\langle a\rangle(\kappa)}
\qquad
\frac{\theta(A) = \theta_I(A),\quad A \text{ is of an array type}}
     {\theta\llbracket\kappa\rrbracket(A) = \theta_I(A)}
\]

where the expression $\mathit{typeOf}\langle a\rangle(\kappa)$ represents a casting operation of
$\kappa$ to the type of variable $a$. If there is a variable which does not match any of
the rules, then we fail to compute $\theta\llbracket\kappa\rrbracket$. And we thus fail to
compute the template. Obviously, one can provide more rules for more complex
symbolic memories. The presented rules are only supposed to illustrate the process.
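
The two rules can be checked against iterated composition with a short Python sketch. The encoding is our own assumption: a value `(symbol, offset)` stands for "symbol + offset", an unchanged array is encoded with offset 0, and the type cast of the parameter is omitted because Python integers are unbounded. The function names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Value = Tuple[str, int]   # (symbol, offset) encodes "symbol + offset"
Memory = Dict[str, Value]


def compose(theta: Memory, theta2: Memory) -> Memory:
    """(theta o theta2)(a) = theta(theta2(a)) for values of the form symbol + offset."""
    return {a: (theta[sym][0], theta[sym][1] + off) for a, (sym, off) in theta2.items()}


def power(theta: Memory, k: int) -> Memory:
    """Composition of k copies of theta; the 0-th power is the identity memory."""
    result = {a: (a, 0) for a in theta}
    for _ in range(k):
        result = compose(result, theta)
    return result


def parametrise(theta: Memory) -> Optional[Callable[[int], Memory]]:
    """Return the parametrised memory as a function of kappa, or None if no rule applies."""
    # A rule applies only if every variable a is mapped to "a + c" (c = 0 for arrays).
    if any(sym != a for a, (sym, _) in theta.items()):
        return None
    return lambda kappa: {a: (a, off * kappa) for a, (_, off) in theta.items()}
```

For a cycle that increments `i` by 1, decrements `s` by 2 and leaves the array `A` untouched, `parametrise` yields `i + κ`, `s - 2κ` and `A`, which agrees with `power(theta, κ)` for every tested κ. A memory such as `{"i": ("j", 1), "j": ("j", 0)}` matches no rule and the computation fails.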
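Having $\theta\llbracket\kappa\rrbracket$ we define

\[
\begin{aligned}
\theta_x\llbracket\kappa\rrbracket &= \theta\llbracket\kappa\rrbracket \circ \hat\theta \\
\varphi_x\llbracket\kappa\rrbracket &= 0 \leq \kappa \wedge \forall\tau\,(0 \leq \tau < \kappa \rightarrow \theta\llbracket\tau\rrbracket(\varphi)) \wedge \theta\llbracket\kappa\rrbracket(\hat\varphi) \\
S_x\llbracket\kappa\rrbracket &= \theta\llbracket\kappa\rrbracket \circ \hat{S},
\end{aligned}
\]

and we get $w.\theta \equiv u.\theta \circ \theta_x\llbracket\nu\rrbracket$,
$w.\varphi \equiv u.\varphi \wedge u.\theta(\varphi_x\llbracket\nu\rrbracket)$ and
$w.S \equiv u.S \circ (u.\theta \circ S_x\llbracket\nu\rrbracket)$. Using these equivalences we
write $w.s \equiv u.s \circ (\theta_x, \varphi_x, S_x, x)\llbracket\nu\rrbracket$, which
is exactly the equivalence used in Definition 5 (L2) and (L3).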
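
As a hedged illustration (the loop is our own assumption, not an example taken from the text), consider a program part consisting of the loop `while (i < n) i := i + 1` over integers with a single exit. Writing $(\theta, \varphi, [\,], e)$ for the state of one pass around the cycle and $(\hat\theta, \hat\varphi, \hat{S}, x)$ for the state of the exit path, the construction of this section would give:

```latex
\begin{align*}
\theta(i) &= \theta_I(i) + 1, \qquad \theta(n) = \theta_I(n), \qquad
  \varphi = (\theta_I(i) < \theta_I(n)), \\
\hat\theta &= \theta_I, \qquad \hat\varphi = \neg(\theta_I(i) < \theta_I(n)), \qquad \hat{S} = [\,], \\
\theta\llbracket\kappa\rrbracket(i) &= \theta_I(i) + \kappa, \qquad
  \theta\llbracket\kappa\rrbracket(n) = \theta_I(n), \\
\theta_x\llbracket\kappa\rrbracket &= \theta\llbracket\kappa\rrbracket \circ \hat\theta
  = \theta\llbracket\kappa\rrbracket, \\
\varphi_x\llbracket\kappa\rrbracket &= 0 \le \kappa
  \wedge \forall\tau\,(0 \le \tau < \kappa \rightarrow \theta_I(i) + \tau < \theta_I(n))
  \wedge \neg(\theta_I(i) + \kappa < \theta_I(n)), \\
S_x\llbracket\kappa\rrbracket &= \theta\llbracket\kappa\rrbracket \circ \hat{S} = [\,].
\end{align*}
```

For integer values the condition $\varphi_x\llbracket\kappa\rrbracket$ is satisfied by exactly one value of $\kappa$, namely $\max(0, \theta_I(n) - \theta_I(i))$, which is the number of iterations the loop performs.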

5.2 Template for Program Parts Representing Recursion 

Let P be a program, $f$ be a recursive function of P, $e$ and $x$ be entry and
exit locations of $f$ respectively and let $h = (u, v)$ be an edge of $f$ with an
action representing a recursive call of $f$. We transform computation of a recursion
template for recursive calling of $f$ into analysis of two program parts $P_1$ and
$P_2$ with cyclic paths. The cycle of $P_1$ starts at location $e$ and leads to $u$. We
then enclose the cycle by an artificial edge whose action simulates an effect of any
call of $f$. Let $e$ be the entry location of $P_1$ and let $x_1, \ldots, x_n$ be its exit locations.
We compute a template
$t_1 = (e, n, \{(\theta_1, \varphi_1, S_1, x_1)\llbracket\kappa\rrbracket, \ldots, (\theta_n, \varphi_n, S_n, x_n)\llbracket\kappa\rrbracket\})$
for $P_1$ according to the algorithm from Section 5.1. Having $t_1$ we can express the
resulting recursion template $t$ as follows.

\[
t = (e, n, \{(\theta_1, \varphi_1, S_1, x_1)\llbracket\kappa\rrbracket, \ldots, (\theta_n, \varphi_n, S_n, x_n)\llbracket\kappa\rrbracket\}, \theta\llbracket\kappa\rrbracket, x),
\]

where $\theta\llbracket\kappa\rrbracket$ is the only unknown component in $t$. We compute the
symbolic memory $\theta\llbracket\kappa\rrbracket$ from analysis of the second program part
$P_2$. The cycle of $P_2$ starts at $x$. There we add an artificial edge, whose action
simulates an effect of return from any call of $f$. The artificial edge gets us to
location $v$. Then we enclose the cycle by following a path from $v$ to $x$. We set $x$
to be the entry location of $P_2$ and we further set $x$ to also be the only exit
location from $P_2$. As you can see, here we have introduced an assumption that
there is no branching along the path from $v$ to $x$, i.e. we cannot escape from the
path. We discuss the case when there is some branching (escape edges) along the
path later. Since we have defined the program part $P_2$, we compute its template
$t_2 = (x, 1, \{(\theta\llbracket\kappa\rrbracket, \mathit{true}, [\,], x)\})$
according to the algorithm from Section 5.1. Then we take the symbolic memory
$\theta\llbracket\kappa\rrbracket$ and we complete the recursion template $t$.

Note that we can simplify computation of $\theta\llbracket\kappa\rrbracket$ of the template
$t_2$ such that we only express a return value of $f$. We do not need to express local
variables of $f$, since requirement (R2) of Definition 5 uses the equivalence
$\simeq$. We further note that the algorithm above also works for indirect recursion.
It immediately follows from the algorithm in Section 5.1, where the cyclic path of
an analysed program part may traverse several functions.

[Figure 4, panels (a) and (b): control-flow graphs, each containing the recursive call t := countIf(A, i+1, n, x)]

Fig. 4. Two equivalent recursive implementations of the function countIf(A, i, n, x).

We finish the section by a discussion of the assumption we made about the cyclic
path of $P_2$. We assumed there is no branching along the path from $v$ to $x$. The
algorithm presented above can compute templates for tail recursions and for
many non-tail ones, while keeping the computation simple (we only need
$\theta\llbracket\kappa\rrbracket$ expressed just for the return value). Therefore, we believe the
assumption has only a small impact on applicability of the algorithm. Besides, it
is always possible to move edges with recursive calls below branchings not
depending on return values from the calls. We demonstrate this process in
Figure 4 (a) and (b), where we depict two equivalent recursive implementations of
the function countIf. We can easily check that in the program in Figure 4 (b) there
are two program parts (one per recursive call), for which we can compute
templates according to the algorithm described above.

6 Discussion 

We presented compact symbolic execution using only templates with a single
parameter. We further restricted ourselves to computation of templates only for
program parts consisting of cyclic paths or representing recursion. We can get
even better reduction of the size of the symbolic execution tree if we create
templates for more complex program parts, and when we use more parameters.
Let us consider the function countIf with a program loop, shown in an earlier
figure. The program loop in the function consists of two cyclic paths around it.
We have already discussed templates for both cycles in Section 5. But if we built
a single template using two parameters (one parameter per cyclic path), then the
resulting compact symbolic execution tree would be finite. We see there is space
for extensions of the basic concepts we presented here.

Let us consider the well-known algorithm binarySearch. Template detection for
this program (even with a single parameter) may infer geometric progressions
as values of some variables. They may later cause serious performance issues for
the SMT solver when they get into a path condition.

Compact symbolic execution commonly has higher performance requirements 
to SMT solvers then classic one. Path conditions may contain template param- 
eters besides symbols. And parameters are quantified. This is the price of the 
ability to reason about multiple program paths at once. 

King showed the effectiveness of symbolic execution for automated test
generation [8]. Producing a good test typically means reaching some interesting
(e.g. bug-suspicious) program location. Compact symbolic execution can be very
helpful in this task. Let us consider a situation where reachability of such a
target location depends on an exact number of iterations of a particular cycle.
Given a template for the program part containing the cycle, we can reason
simultaneously about all the paths exiting the cycle. Therefore, instead of
exploring the path space by classic symbolic execution, we can just send a
query to an SMT solver to check satisfiability of a parametrised path condition.
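
As a minimal sketch of this idea, consider a hypothetical program that starts
with i = 0, repeats i += 3 while i < n, and reaches the target location only
if i == 12 after the loop. The Python code below encodes one possible
parametrised path condition for the target, with the parameter kappa standing
for the number of loop iterations. All names are assumptions made for this
example, and the bounded enumeration in find_witness is only a stand-in for a
real SMT query, not how a solver works.

```python
def reaches_target(n):
    """Concrete execution of the assumed program."""
    i = 0
    while i < n:
        i += 3
    return i == 12


def path_condition(n, kappa):
    """Parametrised path condition for reaching the target after exactly
    kappa loop iterations (after kappa iterations, i == 3 * kappa)."""
    # Quantified part: the loop guard held before each of the kappa iterations.
    stays_in_loop = all(3 * tau < n for tau in range(kappa))
    exits_loop = 3 * kappa >= n
    hits_target = 3 * kappa == 12
    return stays_in_loop and exits_loop and hits_target


def find_witness(bound=50):
    """Bounded brute-force search standing in for a satisfiability query."""
    for kappa in range(bound):
        for n in range(-bound, bound):
            if path_condition(n, kappa):
                return n, kappa
    return None


if __name__ == "__main__":
    print(find_witness())
```

A single satisfiability check of path_condition over n and kappa replaces the
one-by-one exploration of paths with 0, 1, 2, ... loop iterations. The
universally quantified sub-condition (the all(...) expression) also shows why
such queries can be more demanding for a solver.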

King also showed in his paper [8] how symbolic execution can be used for
proving program correctness according to Floyd's method [3]. Using templates,
we can decrease or in some cases even eliminate the need for loop invariants.
For programs where compact symbolic execution is finite in contrast to the
classic one, we do not need loop invariants at all. For other programs, loop
templates describe the behaviour of some paths through a loop, and we may
therefore provide simpler invariants for the remaining behaviour of the loop.

7 Related Work 

Compact symbolic execution is tightly related to the work of King from 1976 [8],
where the author introduced the general concept of classic symbolic execution.
Besides describing symbolic execution, King discussed its applicability to
program testing and to formal proving of correctness according to Floyd's
method [3]. Nevertheless, issues like the path explosion problem were not
tackled.

In [B] the authors propose instrumenting a program with code that provides lazy
initialisation of dynamically allocated data structures such as lists or trees.
This enables symbolic execution of the instrumented program by a standard model
checker without building a dedicated tool. The lazy initialisation algorithm is
further improved and formally defined as an operational semantics of a core
subset of the Java Virtual Machine in [2].

The scalability of symbolic execution to real-world programs can be improved by
exploring only the client's code [7]. Library code (such as string manipulation
or standard containers like sets or maps) can be assumed to be well defined and
properly tested.

There are several techniques based on symbolic execution that construct loop
summaries or simply count loop iterations [5, 11, 12]. The introduction of
counters usually makes it possible to speak about multiple paths through a loop
at once. The technique presented in [5] analyses loops on-the-fly, i.e. during
simultaneous concrete and symbolic execution of a program for a concrete input.
The loop analysis infers inductive variables, i.e. variables that are modified
by a constant value in each loop iteration. These variables are used to build
loop summaries expressed in the form of pre- and postconditions. The LESE
technique introduces symbolic variables for the number of times each loop was
executed. LESE links the symbolic variables with features of a known grammar
generating inputs. Using these links, the grammar can control the numbers of
loop iterations performed on a generated input. The symbolic-execution-based
algorithm in [12] produces a nontrivial necessary condition on input values to
drive the program execution to a given location. The key part of the technique
is the computation of loop summaries in the form of symbolic program states and
path conditions, both parametrised by so-called path counters. Each path
counter is assigned to an individual path through the analysed loop.

There are also approaches computing function summaries [4, 1]. Reusing
summaries at call sites typically leads to an interesting performance
improvement. Moreover, summaries may insert additional symbolic values into a
path condition, which often leads to another performance improvement.

Finally, there are also techniques partitioning program paths into separate
classes according to the impact of the paths on a given set of program
variables [9, 10]. Values of output variables are typically used as the
partitioning criterion.

8 Conclusion 

We introduced a generalisation of classic symbolic execution called compact
symbolic execution. We generalised the notion of symbols of classic symbolic
execution so that symbols can now be related to different program locations.
This allows us to analyse individual parts of a given program separately from
the rest of the program. We further introduced the concept of templates
representing declarative parametric descriptions of the behaviour of separately
analysed program parts. We gave a precise definition of templates with one
parameter and we provided an algorithm of compact symbolic execution using
these templates.